import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set(font_scale=2)
sns.set_style("whitegrid")

Comparing countries with PCA

Now that we've looked at positions for general players, we can try to compare the players from two different countries. This may help us predict the winner of a match between the two, for example during the World Cup.

We'll first compare Brazil, a perennial powerhouse, and Japan, a relative newcomer to professional football. We'll construct two datasets, one with goal-keepers, and one with "regular" players.

df = pd.read_csv("FIFA_2018.csv", encoding="ISO-8859-1", index_col=0, low_memory=False)

country_1 = 'Brazil'
country_2 = 'Japan'

D = df[df['Nationality'].isin([country_1, country_2])].copy()

D.head()

Construct two datasets, one with goal-keepers (name it D_gk), and one with "regular" players (name it D_reg). The dataset with regular players should have no goal-keeping statistics, and vice versa.

# clear
D_gk = D[D['Position'] == 'GK'].copy()
D_gk = D_gk[['GK diving', 'GK handling', 'GK kicking', 'GK positioning', 'GK reflexes',
            'Nationality']]

D_reg = D[D['Position'] != 'GK'].copy()
# Drop every goal-keeping statistic from the regular-player table.
D_reg = D_reg.drop(columns=['GK diving', 'GK handling', 'GK kicking',
                            'GK positioning', 'GK reflexes'])

Now we can once again subtract the mean, compute the SVD, and add the first two principal components as columns in the dataframes.

# clear
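# Keep only the numeric attributes: drop the last four columns of D_reg
# (Position, Name, Nationality, Club) and the last column of D_gk (Nationality).
X_reg = D_reg.iloc[:, :-4].copy()
X_gk = D_gk.iloc[:, :-1].copy()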

A = X_reg - X_reg.mean()
B = X_gk - X_gk.mean()


U, S, Vt = np.linalg.svd(A, full_matrices = False)
V = Vt.T

u, s, vt = np.linalg.svd(B, full_matrices = False)
v = vt.T

D_reg['pc1'] = U[:,0]*S[0]
D_reg['pc2'] = U[:,1]*S[1]

D_gk['pc1'] = u[:,0]*s[0]
D_gk['pc2'] = u[:,1]*s[1]
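
As a quick sanity check on the step above, the sketch below uses a small synthetic matrix (an assumption standing in for X_gk, not FIFA data) to confirm that U * S equals the centered data projected onto the right singular vectors, and to compute how much of the total variance the first two components capture. The names rng, X, scores, and explained are illustrative only.

```python
import numpy as np

# Synthetic stand-in for a player-by-attribute matrix (not FIFA data).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5)) @ rng.normal(size=(5, 5))

# Center each column, then take the thin SVD, as in the cell above.
A = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T

# Principal component scores: U * S is the same as projecting A onto V.
scores = U * S
assert np.allclose(scores, A @ V)

# Fraction of the total variance captured by each component.
explained = S**2 / np.sum(S**2)
print("variance explained by pc1 + pc2:", explained[:2].sum())
```

If the first two components capture only a small share of the variance, a two-dimensional plot may hide differences between the countries that live in later components.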

We'll first compare the goalkeepers, by plotting the first two principal components (use the same lmplot code snippet from part 1). Since there are only 5 goalkeeper attributes, we can plot all attributes and see how the two countries stack up.
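
The lmplot snippet from part 1 is not reproduced in this section, so the following is only a sketch of the same idea using plain matplotlib instead: scatter the first two principal components colored by nationality, then draw each attribute's projection as a line from the origin to its loading (the matching row of V, restricted to the first two columns). The toy dataframe, the scale factor, and the helper name biplot are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def biplot(D, V, attr_names, scale=1.0):
    """Scatter pc1 vs pc2 by nationality and overlay attribute loadings."""
    fig, ax = plt.subplots(figsize=(8, 8))
    for country, group in D.groupby("Nationality"):
        ax.scatter(group["pc1"], group["pc2"], label=country, alpha=0.7)
    for j, name in enumerate(attr_names):
        # Row j of V gives attribute j's weight on each principal component.
        ax.plot([0, scale * V[j, 0]], [0, scale * V[j, 1]], color="k")
        ax.annotate(name, (scale * V[j, 0], scale * V[j, 1]))
    ax.set_xlabel("pc1")
    ax.set_ylabel("pc2")
    ax.legend()
    return ax

# Toy goalkeeper table (made-up numbers, not FIFA data).
rng = np.random.default_rng(1)
attrs = ["GK diving", "GK handling", "GK kicking", "GK positioning", "GK reflexes"]
toy = pd.DataFrame(rng.integers(40, 90, size=(12, 5)), columns=attrs)
toy["Nationality"] = ["Brazil"] * 6 + ["Japan"] * 6

# Same centering and SVD steps as for D_gk above.
B = toy[attrs] - toy[attrs].mean()
u, s, vt = np.linalg.svd(B, full_matrices=False)
toy["pc1"], toy["pc2"] = u[:, 0] * s[0], u[:, 1] * s[1]

ax = biplot(toy, vt.T, attrs, scale=s[0] / 2)
```

Players lying far along an attribute's line score above average on that attribute, which is how the comparisons below are read off the plot. The scale argument only stretches the loadings so they are visible next to the scores.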

It appears that Brazilian goalkeepers have a clear advantage in handling, positioning, reflexes, and diving. Kicking is a little more even, but it still looks like Brazil has an advantage.

Furthermore, the best Brazilian goal-keepers seem to be much better than the best Japanese goal-keepers.

Now let's compare the forward players. You will need to first extract the dataset in which D_reg['Position']=='FWD'. Then plot the first two principal components and the projections for attributes [2,9,19,21, 24].
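
The extraction step can be sketched as follows. The toy dataframe and its four attribute columns are made up for illustration; with the real D_reg, integer indices such as [2, 9, 19, 21, 24] would refer to column positions in X_reg and to the matching rows of V. The names toy, fwd, attr_idx, and arrows are assumptions.

```python
import numpy as np
import pandas as pd

# Toy stand-in for D_reg: attribute columns first, metadata last.
toy = pd.DataFrame({
    "Finishing": [89, 38, 67, 80, 45, 72],
    "Stamina": [78, 74, 81, 60, 70, 65],
    "Agility": [96, 74, 77, 85, 66, 79],
    "Strength": [53, 81, 77, 70, 75, 62],
    "Position": ["FWD", "DEF", "DEF", "FWD", "MID", "FWD"],
    "Nationality": ["Brazil", "Brazil", "Brazil", "Japan", "Japan", "Japan"],
})

# PCA on all regular players, as in the cells above.
X = toy.iloc[:, :-2]
A = X - X.mean()
U, S, Vt = np.linalg.svd(A, full_matrices=False)
V = Vt.T
toy["pc1"] = U[:, 0] * S[0]
toy["pc2"] = U[:, 1] * S[1]

# Keep only forwards; their pc1/pc2 values come along with the rows.
fwd = toy[toy["Position"] == "FWD"].copy()

# Pick attributes by column position and look up their loadings.
attr_idx = [0, 1]  # hypothetical indices for this toy table
attr_names = X.columns[attr_idx]
arrows = pd.DataFrame(V[attr_idx, :2], index=attr_names, columns=["pc1", "pc2"])
print(fwd[["Nationality", "pc1", "pc2"]])
print(arrows)
```

Because the principal components are computed on all regular players before subsetting, forwards, mid-fielders, and defenders share the same axes and can be compared against the same overall average.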

From here, it looks like Japan has many more below-average forwards than Brazil. Nearly all Japanese forwards have below-average stamina and reactions, and Brazilian forwards are more likely to have stronger finishing, agility, and shot power.

Next, compare the mid-fielders. First extract the dataset in which D_reg['Position']=='MID'. Then plot the first two principal components and the projections for attributes [8,12,14,18,24].

Again, it seems that Brazilian mid-fielders are more skilled. Even when Japanese mid-fielders are skilled defensive players (that is, above average in interceptions), their defensive-minded Brazilian counterparts are not below average in other skills.

Last, compare the defenders. First extract the dataset in which D_reg['Position']=='DEF'. Then plot the first two principal components and the projections for attributes [1,12,14,22,24,26].

Finally, Japanese defenders are behind Brazilian defenders when it comes to important defensive attributes like interceptions, sliding tackles, aggression, and long passing.

It seems clear that Brazil tends to have more skilled football players than Japan, which is unsurprising given Brazil's decades of dominance in the sport. Although the two teams never played each other in the 2018 World Cup, Brazil finished with a better record and advanced further in the final bracket.

You can now repeat the analysis for any two countries of your choice! Can principal component analysis explain any of the results from the last World Cup? That is, was it obvious beforehand that France would beat Croatia in the final match? Are there any results that are surprising?

	Acceleration	Aggression	Agility	Balance	Ball control	Composure	Crossing	Curve	Dribbling	Finishing	...	Sprint speed	Stamina	Standing tackle	Strength	Vision	Volleys	Position	Name	Nationality	Club
2	94	56	96	82	95	92	75	81	96	89	...	90	78	24	53	80	83	FWD	Neymar	Brazil	Paris Saint-Germain
30	70	77	74	68	80	83	60	61	68	38	...	74	74	89	81	74	63	DEF	Thiago Silva	Brazil	Paris Saint-Germain
39	77	84	77	82	88	85	90	80	84	67	...	79	81	85	77	75	54	DEF	Marcelo	Brazil	Real Madrid CF
51	84	82	79	79	81	82	86	78	82	55	...	88	93	84	80	70	68	MID	Alex Sandro	Brazil	Juventus
54	88	55	92	92	88	85	77	84	88	74	...	77	80	44	61	87	75	MID	Coutinho	Brazil	Liverpool